High precision in epileptic seizure self-reporting with an app diary

People with epilepsy frequently under- or inaccurately report their seizures, which poses a challenge for evaluating their treatment. The introduction of epilepsy health apps provides a novel approach that could improve seizure documentation. This study assessed the documentation performance of an app-based seizure diary and a conventional paper seizure diary. At two tertiary epilepsy centers, patients were asked to use one of two offered methods to report their seizures (paper or app diary) during their stay in the epilepsy monitoring unit. The performance of both methods was assessed against the gold standard of video-EEG annotations. In total, 89 adults (54 paper and 35 app users) with focal epilepsy were included in the analysis, of whom 58 (33 paper and 25 app users) experienced at least one seizure and made at least one seizure diary entry. We observed a high precision of 85.7% for the app group, whereas the paper group's precision was lower (66.9%) due to overreporting. Sensitivity was similar for both methods. Our findings imply that performance of seizure self-reporting is patient-dependent but is more precise for patients who are willing to use digital apps. This may be relevant for treatment decisions and future clinical trial design.


Data collection
During their stay in the epilepsy monitoring units (EMUs) of the tertiary epilepsy centers, participants had to choose between two methods to self-report their seizures: either a paper-based or an app-based seizure diary (Helpilepsy; Neuroventis, Belgium) installed on their smartphones. Both diaries were based on questionnaires in the form of free-text forms (see Supplementary Material Fig. S1). For the current study, only the seizure diary related questions from both the paper and app-based diaries were used. Participants using the paper form were not reminded to report their seizures, while participants using the app diary received a push notification each day at 6 PM to complete this questionnaire. The paper questionnaires could be edited continuously by the participants during the day; in contrast, once participants in the app group had completed the questionnaire, they could no longer change their answers. Nevertheless, they could additionally self-report seizures through the app's standard seizure diary feature by entering details such as date, time, and seizure type, and use this information to fill out the questionnaire.

Dataset creation
All diary entries were first manually assigned by L.S. and N.Z. to their corresponding electroclinical seizures (classified according to International League Against Epilepsy (ILAE) guidelines 19) on the video-EEG, based on time information. For the assignment, every reasonable temporal difference between the diary entry and its corresponding seizure timestamp was accepted. Diary entries that were correctly assigned to their seizures were considered true positives (TPs). Diary entries that could not be matched to a video-EEG documented seizure were considered false positives (FPs). Seizures documented with video-EEG for which no diary entry was available were considered false negatives (FNs). All diary entries that were not relevant to seizures in the specific temporal context (e.g., "I am not feeling well today") were ignored. For every participant, information about age, sex, prior diary experience, anti-seizure medication (ASM), duration of their stay in the EMU, and the number of their seizures and diary entries was extracted. Further, for every seizure, details such as seizure type, duration, originating hemisphere, involved lobes, and preictal awake/asleep state were extracted.
Participants were eligible for inclusion if they were adults with a diagnosed form of focal epilepsy who filled out an app-based or paper-based seizure diary by themselves while participating in the SeizeIT2 study in Leuven or Freiburg, and if their diary was available. Additional inclusion criteria were having at least one seizure during their stay in the EMU while participating in the study and/or having made at least one (seizure-relevant) diary entry. Excluded were participants with generalized epilepsy, due to very low documentation sensitivity 18; participants supported by caregivers, due to expected oversensitivity; and children, as they were under parental supervision.
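The assignment of diary entries to seizures was done manually in this study. As a rough illustration only, the resulting TP/FP/FN bookkeeping can be sketched as a greedy one-to-one matching on timestamps. The function name `match_entries` and the one-hour `tolerance` are hypothetical; the study accepted "every reasonable temporal difference" and did not specify a fixed window.

```python
from datetime import datetime, timedelta

def match_entries(seizure_times, entry_times, tolerance=timedelta(hours=1)):
    """Greedy one-to-one matching of diary entries to video-EEG seizures.

    `tolerance` is an assumed window, not a value taken from the study.
    Returns the counts (TP, FP, FN).
    """
    unmatched = sorted(seizure_times)
    tp = fp = 0
    for entry in sorted(entry_times):
        # Closest seizure that has not been claimed by an earlier entry.
        best = min(unmatched, key=lambda s: abs(s - entry), default=None)
        if best is not None and abs(best - entry) <= tolerance:
            unmatched.remove(best)
            tp += 1          # entry matched to a documented seizure
        else:
            fp += 1          # entry without video-EEG correlate
    fn = len(unmatched)      # seizures without any diary entry
    return tp, fp, fn

# Toy data: three recorded seizures, two diary entries.
day = datetime(2024, 1, 1)
seizures = [day.replace(hour=8), day.replace(hour=14), day.replace(hour=23)]
entries = [day.replace(hour=8, minute=10), day.replace(hour=18)]
counts = match_entries(seizures, entries)
```

In this toy case the 08:10 entry is matched to the 08:00 seizure, the 18:00 entry has no seizure within the window, and two seizures remain unreported.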

Assessment of performance metrics of seizure diaries
The performance of participants in both groups was assessed using metrics based on the standards for testing and clinical validation of seizure detection devices 20, namely sensitivity, precision, F1-score, and false alarm rate per 24 h (FAR24). At the level of seizures, the metrics were calculated by pooling all values, regardless of the participants. Further, at the level of participants, metrics were determined on a participant-specific basis, and subsequently, the arithmetic mean and standard deviation were computed across all participants. Participant-level metrics are therefore independent of the total number of seizures and diary entries of a participant. Sensitivity, requiring at least one seizure occurrence, and precision, requiring at least one diary entry, were calculated only if a participant fulfilled both criteria. F1-score and FAR24 were also calculated for participants who met only one of these criteria.
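The two aggregation levels can be illustrated with a minimal sketch that uses the standard definitions: sensitivity = TP/(TP+FN), precision = TP/(TP+FP), F1 = 2TP/(2TP+FP+FN), and FAR24 = FP per recording hour times 24. The function names, the dictionary layout, and the use of the sample standard deviation are assumptions made for illustration; the source does not spell them out.

```python
from statistics import mean, stdev

def metrics(tp, fp, fn, hours):
    """Sensitivity, precision, F1-score and FAR24 from raw counts.

    A metric is None when its denominator is zero (e.g. no seizures).
    """
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "precision": tp / (tp + fp) if tp + fp else None,
        "f1": 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else None,
        "far24": fp / hours * 24,
    }

def seizure_level(participants):
    """Pool all counts across participants, then compute the metrics once."""
    tp, fp, fn, hours = (sum(p[k] for p in participants)
                         for k in ("tp", "fp", "fn", "hours"))
    return metrics(tp, fp, fn, hours)

def participant_level(participants):
    """Compute metrics per participant, then return (mean, SD) per metric."""
    per = [metrics(p["tp"], p["fp"], p["fn"], p["hours"]) for p in participants]
    out = {}
    for name in ("sensitivity", "precision", "f1", "far24"):
        vals = [m[name] for m in per if m[name] is not None]
        out[name] = (mean(vals), stdev(vals) if len(vals) > 1 else 0.0)
    return out

# Toy cohort: one perfect documenter and one participant with many seizures.
cohort = [
    {"tp": 2, "fp": 0, "fn": 0, "hours": 48},
    {"tp": 1, "fp": 1, "fn": 3, "hours": 96},
]
pooled = seizure_level(cohort)
averaged = participant_level(cohort)
```

With this toy cohort the pooled sensitivity is 0.5, whereas the participant-level mean is 0.625, because the participant with many seizures no longer dominates the average.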

Statistical analysis of potential influencing parameters
The statistical associations between meta parameters and the performance metrics were investigated to find possible influencing factors, for the overall cohort and for the specific groups. Differences in proportions of meta parameters between the app and the paper group were also analyzed.
At the level of seizures, the preictal awake/asleep state of the participant, level of awareness (according to the seizure classification by the ILAE), seizure duration, hemispheric lateralization, and lobar origin were investigated. Since this information was only present for seizures (TPs, FNs) and not for reported non-seizures (FPs), all the analyzed associations at seizure level only reflect their influence on sensitivity, not on precision. At the level of participants, age, sex, diary experience, ASM, and seizure frequency were investigated for potential effects on sensitivity and precision of seizure documentation.
Nonparametric statistical testing was applied (Mann-Whitney U test, Spearman rank correlation, chi-squared test [χ²]) and significance levels were adjusted for multiple comparisons using the Bonferroni correction.
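As a brief illustration of the Bonferroni correction, each p-value can be multiplied by the number of comparisons (capped at 1) and compared with the unadjusted threshold. The threshold `alpha=0.05` and the example p-values are assumptions; the source does not state the significance level or the number of comparisons per family.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p, significant?) for each raw p-value.

    Adjusted p = min(1, p * m), where m is the number of comparisons.
    `alpha` is an assumed threshold, not taken from the study.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return [(p_adj, p_adj < alpha) for p_adj in adjusted]

# Hypothetical family of three tests.
results = bonferroni([0.001, 0.02, 0.04])
```

Here only the first test remains significant after correction, although all three raw p-values are below 0.05.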

Dataset
Figure 1 illustrates the consecutive filtering steps leading to the final dataset and to a subset that meets the necessary criteria for calculating the metrics. The dataset, including 89 participants with at least one seizure OR at least one diary entry, comprises a total of 310 seizures and 245 diary entries. The subset, including 58 participants with at least one seizure AND at least one diary entry, contains 247 seizures and 198 diary entries.
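The difference between the two filter levels comes down to a logical OR versus a logical AND on the per-participant counts. A minimal sketch, with hypothetical function names and toy records:

```python
def in_dataset(n_seizures, n_entries):
    """Dataset: at least one seizure OR at least one diary entry."""
    return n_seizures >= 1 or n_entries >= 1

def in_subset(n_seizures, n_entries):
    """Subset: at least one seizure AND at least one diary entry."""
    return n_seizures >= 1 and n_entries >= 1

# Toy records as (seizures, diary entries); not study data.
records = [(3, 2), (4, 0), (0, 1), (0, 0)]
dataset = [r for r in records if in_dataset(*r)]
subset = [r for r in records if in_subset(*r)]
```

Participants such as `(4, 0)` or `(0, 1)` stay in the dataset but drop out of the subset, because either sensitivity or precision has a zero denominator for them.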

Performance metrics
The numbers of true positives (TPs), false positives (FPs), and false negatives (FNs) are shown in Table 1, and the resulting performance metrics are presented in Table 2. For the complete dataset, sensitivities were below 50%, independent of the method, while for the app group, precision at seizure level was high (79.5%). Within the subset at seizure level, the app group had an overall higher precision (85.7% vs. 66.9%), higher F1-score (69.8% vs. 63.3%), and lower FAR24 (0.09 vs. 0.31) compared to the paper group. Sensitivities were very similar (app group: 58.9%, paper group: 60.0%). At participant level, the same phenomena were observed within the subset; however, the app group showed a higher average sensitivity of 76.5% compared to the paper group (69.9%). Moreover, individuals showed a high precision for both methods (app: 88.7%, paper: 82.7%), with low variation between individuals in the app group (± 19.9%) and high variation (± 32.2%) in the paper group.

Figure 1. Dataset creation. This diagram shows the consecutive steps performed to filter the data for the analysis. After excluding (in red) participants (P) without diaries or those with diaries filled out by caregivers, the final two datasets comprised individuals with focal epilepsy who self-reported their seizures (Sz) through diary entries (E) either in the app or on paper. These individuals had at least one seizure (not seizure free) OR made at least one diary entry (active use of the diary), i.e., the dataset, or had at least one seizure AND made at least one diary entry, i.e., the subset. The average duration (in hours) and standard deviation of the participants' stay in the EMU are given in brackets for the two sets and their subgroups.

Figure 2 reveals that 25.8% (23/89) of all participants in the dataset had unreported seizures only, while 9.0% (8/89) of the participants reported only seizures without video-EEG correlate. These 34.8% (31/89, marked in gray) of the participants had to be excluded because sensitivity and precision could not be calculated. The remaining 65.2% of the participants (58/89), comprising the subset, experienced at least one seizure (i.e., were not seizure free) and made at least one diary entry (i.e., used the diary actively). These 58 participants were categorized into perfect documenters (F1-score equals 100%, 24/58, marked in green) and non-perfect documenters (F1-score < 100%, 34/58, marked in red). No patient in the group of perfect documenters had more than 5 seizures, while the group of non-perfect documenters had a highly variable number of seizures, ranging from 1 up to 30. Of the 34 non-perfect documenters, 26 were underreporting, i.e., they reported fewer seizures than their actual seizure count. Of these 26 underreporting non-perfect documenters, 17 reported no FPs, 7 reported FPs and TPs, and two reported only FPs. The remaining 8 participants among the non-perfect documenters were overreporters, i.e., participants reporting more seizures than their actual number of seizures. Classified into the app and paper group (subset), underreporters who reported only FPs were solely observed in the paper group (n = 2, 6%). In the app group, there were 5 (20%) underreporters reporting FPs and TPs, while there were two (6%) in the paper group. Furthermore, there were two (8%) overreporting non-perfect documenters in the app group, while there were 6 (18%) in the paper group, who were responsible for 76.6% (36/47) of that group's FPs.
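The categories described above can be expressed in terms of a participant's TP, FP, and FN counts. The following is a sketch under the assumption that the number of reported seizures is TP + FP and the actual seizure count is TP + FN, so that underreporting corresponds to FP < FN and overreporting to FP > FN. The function name and the label for the tie case are hypothetical; the source does not describe how a tie would be handled.

```python
def categorize(tp, fp, fn):
    """Assign a participant to a documentation category from raw counts."""
    if tp + fn == 0 or tp + fp == 0:
        return "excluded"            # no seizure or no diary entry
    if fp == 0 and fn == 0:
        return "perfect"             # F1-score equals 100%
    if fp < fn:
        return "underreporter"       # fewer reports than actual seizures
    if fp > fn:
        return "overreporter"        # more reports than actual seizures
    return "non-perfect, equal counts"  # assumed label for FP == FN > 0

labels = [categorize(*c) for c in [(3, 0, 0), (2, 0, 3), (0, 1, 4), (1, 5, 1)]]
```

The third toy participant illustrates an underreporter who reported only FPs: one unmatched entry against four unreported seizures.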

Influencing factors
All statistical tests were applied on the subset (see Supplementary Material Tables S7-10).

Seizure level
We observed that a significantly lower proportion of seizures was documented in the seizure diaries if they happened while the participants were asleep compared to when they were awake (asleep: 44.3% [47/106] vs. awake: 70.9% [100/141], χ²(1) = 16.66, p < 0.001). This association was present in both groups, but was significant only in the app group (awake: 72.9% [54/74] vs. asleep: 31.6% [12/38], χ²(1) = 16.11, p < 0.001), and not in the paper group (awake: 68.7% [46/67] vs. asleep: 51.5% [35/68], χ²(1) = 3.47, p = 0.063). The level of awareness during seizures was not associated with the number of reported seizures. Note that 50 seizures were excluded from this analysis because the level of awareness could not be determined according to the guidelines. Furthermore, we found no association of the number of reported seizures with their origin, lateralization or duration.
Between the groups, the preictal awake/asleep state, the seizure origin, and the seizure duration did not differ statistically. However, there were fewer seizures with preserved awareness in the app group than in the paper group (χ²(1) = 13.59, p < 0.001), and there were more seizures originating from the left hemisphere in the app group than in the paper group (76.4% [55/72] vs. 41.8% [38/91], χ²(1) = 18.27, p < 0.001).
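The awake/asleep comparison for the overall subset can be checked with a 2x2 chi-squared test built from the reported counts (100 of 141 awake seizures and 47 of 106 asleep seizures were documented). The sketch below assumes Yates' continuity correction; the source does not state that it was used. The function name is hypothetical.

```python
import math

def chi2_yates(a, b, c, d):
    """Chi-squared test with Yates' correction for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p-value) for one degree of freedom.
    """
    n = a + b + c + d
    numerator = n * max(0.0, abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # Survival function of the chi-squared distribution with 1 df.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Rows: awake, asleep; columns: reported, not reported.
chi2_all, p_all = chi2_yates(100, 41, 47, 59)
chi2_paper, p_paper = chi2_yates(46, 21, 35, 33)
```

Under this assumption the statistic is about 16.66 for the overall subset and about 3.47 for the paper group.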

Participant level
Seizure frequency had a significant negative correlation with sensitivity (Spearman rank correlation, r = −0.63, p < 0.001, n = 58) but not with precision (Spearman rank correlation, r = −0.17, p = 0.212, n = 58). The significant negative correlation of seizure frequency with sensitivity was evident in both groups (app: [Spearman rank correlation, r = −0.60, p < 0.001, n = 25], paper: [Spearman rank correlation, r = −0.62, p < 0.001, n = 33]). No significant associations were found between sensitivity or precision and the participants' age, sex, diary experience or ASM, neither in the overall cohort nor in the groups. Furthermore, none of these parameters differed between the groups.
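For readers unfamiliar with the Spearman rank correlation, it is the Pearson correlation computed on ranks, with tied values receiving their average rank. The following self-contained sketch uses made-up numbers, not study data; all function names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var) if var else float("nan")

def spearman(x, y):
    """Spearman rank correlation as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical participants: seizure count vs. sensitivity.
seizure_counts = [1, 2, 3, 4, 5]
sensitivities = [0.9, 0.8, 0.5, 0.5, 0.1]
rho = spearman(seizure_counts, sensitivities)
```

In this toy example sensitivity falls as the seizure count rises, giving a strongly negative coefficient.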

Discussion
We found that seizures were reported with high precision by individuals with focal epilepsy who used an app-based seizure diary. Seizures reported in paper diaries were associated with a lower precision, as some individuals tended to overreport their seizure activity. In addition, the study revealed that people with epilepsy with a higher seizure frequency showed a lower documentation sensitivity. Our study also corroborates previous research which found moderate sensitivity (approximately 50%) of seizure diaries and an effect of preictal awake/asleep state on sensitivity.

App-based diaries associated with high self-reporting precision
Sensitivity at seizure level was around 60% for both methods, which aligns with prior studies 7. However, at participant level, individuals in the app group showed higher sensitivity, despite the fact that these patients more frequently had focal unaware seizures and seizures originating from the left hemisphere (both parameters which have been reported to be associated with a low self-reporting performance). Importantly, our study demonstrated that reported seizures are associated with a higher precision (85.7%) for the app-based diary and a lower precision when paper was used (66.9%). This finding aligns with studies on diaries for headaches and pain, which showed significantly fewer errors in digital forms compared to paper forms 21,22. While at participant level individuals showed a high precision for both methods, the higher variation in the paper group stems from the non-perfect documenters with a tendency to overreport their actual seizure count (6 participants). These findings suggest that, for patients who are willing to use it, an app may offer a more reliable option for seizure documentation. Furthermore, the FAR24 of the app group indicated only about one falsely reported seizure every two weeks on average, highlighting the app's usefulness for clinical trials. Statistical analysis did not reveal differences between the groups at participant level, i.e., in age, sex, prior diary experience, seizure frequency, or medication.

Impact of overreporting in seizure self-reporting
There were six patients in the paper group who considerably overreported their seizure frequencies, resulting in a low self-reporting precision, and who were responsible for 76.6% of their group's total FP count. Therefore, understanding how and why non-seizures (FPs) are reported is crucial to comprehend self-reporting precision. Prior research has predominantly focused on the sensitivity of seizure diaries, whereas the number of FPs and precision have barely been explored in relation to overreporting.
Multiple studies noted a significant disparity between clinical and documented events in adult patients with focal epilepsy, concluding that most patients underestimate their seizures 23-25. However, Pizarro et al. 26 and Elmali et al. 27 both focused on generalized seizures and reported that 57% and 37.5% of their cohorts, respectively, were overreporting. Lacking objective measures, the authors hypothesized that over-attentiveness due to patients' or caregivers' fear of missing potential seizures in the EMU might be responsible. Identifying factors contributing to FPs in our study, independent of the method, was challenging due to the absence of meta parameters for non-seizure events: participants might have had limited knowledge of seizure semiology, perceiving trivial feelings and movements as seizures 28. From our data, it indeed appears that uncertain subjective experiences might have served as potential indicators for seizures. Further, hospitalization could have induced hypervigilance and over-sensitivity 26,27.
In contrast, in an outpatient setting, Cook et al. 25 found overreporting to be part of the reason for a lack of correlation between self-reported and objectively documented seizures as assessed by mobile intracranial EEG. Recently, Hannon et al. 29 observed seizure overreporting in a large outpatient cohort. Specifically, in patients with diagnosed focal epilepsy, 43% of the reported events were FPs, which aligns with our findings (40% FPs in the overall cohort). These two studies strongly suggest that overreporting is not an artifact limited to the conditions of inpatient monitoring.
Patient education regarding seizure semiology, coupled with a neurologist's careful explanation of typical clinical signs, could significantly enhance self-reporting precision. In fact, in our cohort, one reported FP was classified as a psychogenic non-epileptic seizure (PNES). A more extensive description of events in the diary might allow PNES to be distinguished from epileptic seizures based on certain features. Further, this observation also highlights the fact that certain epileptic seizures, characterized by a clear epileptic clinical pattern, can lack an electrographic correlate in surface EEG signals, which is particularly evident in focal seizures 30. Finally, the diary form may have influenced the participants' tendency to overreport, especially in our study setting with a short inpatient stay. Reporting seizure activity with a hastily written paper note can leave significant room for errors. Conversely, utilizing a new app, with its comprehensive range of functions and the associated mental effort, may have encouraged participants to focus more on accurate self-reporting, thereby reducing the margin for error.

Sensitivity is influenced by awake/asleep state and seizure frequency
We observed an overall self-reporting sensitivity of 59.5%. Poor documentation sensitivity, indicating underreporting, is primarily attributed to seizure-induced lack of awareness and nocturnal seizures 7. Schulze-Bonhage et al. 8 recently confirmed circadian effects of underreporting seizures, aligning with our findings. Furthermore, Kerling et al. 31 had already shown that seizures during sleep are significantly less reported compared to awake seizures. Unlike Hoppe et al. 23, our study did not find a significant association between the level of awareness and the number of reported seizures. This may be due to the variations in the definitions of awareness in the guidelines used 28, and to the limited number of seizures with information on the state of awareness. Furthermore, multiple studies revealed a negative effect on documentation sensitivity from seizures originating from the left temporal lobe 31-33, which was not found in the seizures analyzed for this study. Interestingly, although the app group was associated with more seizures originating from the left hemisphere and with impaired awareness (both factors linked to underreporting), a lower sensitivity was not found. This possibly indicates an overall positive effect of the app on documentation sensitivity.
At the participant level, we found a negative correlation between seizure frequency and sensitivity. Higher seizure frequency leading to lower sensitivity may be attributed to a 'diary fatigue effect' 34. This effect becomes evident when comparing seizure counts between perfect documenters and non-perfect documenters. It further implies that reports from patients with high seizure counts should be interpreted carefully when valid documentation is critical, e.g., for assessing the outcome of a patient's treatment or for defining inclusion criteria for clinical trials.

Pros and cons of seizure diaries: future directions
Study outcomes suggest that self-reporting seizures using an app may be associated with a high precision, particularly for patients who are willing to use digital apps. Although people with epilepsy generally show interest in documenting seizures with apps 35, the actual utilization is still low 36. Explaining the benefits of long-term use and providing training 37,38 and education 39,40 on these tools might increase utilization (see Supplementary Material). Our data may thus support the recommendation of apps for seizure documentation in clinical practice and particularly in clinical trials. Besides our findings, app-based diaries show several other benefits. These are, among others, better readability, increased temporal accuracy due to predefined time stamps 12, a lower risk of losing data, and enhanced accessibility for health care providers 14. Further, systematically collected data would allow for personalized, data-driven therapy with the help of automatic analysis tools. This technology might be useful for clinical and research purposes, serving as a patient-reported outcome and aiding in the evaluation of medication. Specifically in therapeutic trials, research has already shown that high app-based diary compliance can be achieved 41.
Nevertheless, the use of seizure diaries to self-report seizures lacks sensitivity 7, as recently also shown for ultra-long periods in the outpatient setting 42. Further, the performance depends on the individual patient, and even with high diary compliance, self-reporting is still affected by seizure unawareness. One way to overcome this challenge is through the utilization of wearable seizure detection devices that can objectively measure seizure signs based on biosignals such as electrocardiogram, acceleration, or low-channel EEG 9. Looking ahead, an optimal solution may involve the integration of app-based diaries into wearable seizure detection systems 7. This combined approach has the potential to enhance sensitivity and minimize the number of false positives (FPs), ultimately leading to an overall improved performance in seizure documentation. Moreover, objective seizure detection based on biosignals can be complemented with patients' descriptions of precipitating factors of their seizures and possibly even videos of the event taken by a caregiver, harnessing the full potential of these methods.

Limitations
Given the retrospective nature of this study, it inherently entails certain limitations. First, participants were not randomly assigned to one of the two diaries; instead, they were allowed to self-select their preferred diary. As a result, the study allowed the identification of associations but precluded the establishment of causal inferences. While letting patients choose the documentation method may induce a selection bias, this reflects clinical practice, and the self-reporting performance in either diary is not independent of the patient's preferences. Second, app diary results were based on questionnaire responses in the app, potentially affecting performance (see Methods). Besides, the daily reminders provided only to app users might have influenced results. However, Hoppe et al. 23 reported that such daily reminders do not necessarily have an influence on documentation performance, leaving it undecided whether this factor contributed to the superior precision of the app group. Third, the controlled lab conditions of EMUs and changes in ASM dosages differ from real-life home environments and might have influenced seizure reporting. Finally, generalizability is limited, as only focal epilepsies were analyzed and as patients in whom caregivers supported seizure documentation were excluded from this study. Further research in larger datasets may also address possible effects of seizure clustering on their documentation, which were not examined here.

Conclusion
In conclusion, app-based seizure diaries have the potential to enhance patients' performance in self-reporting seizures, particularly in terms of precision. Accurate seizure diaries allow neurologists to make better decisions in daily clinical care. Increased utilization of epilepsy health apps will expand databases, contributing to further validation of app-based approaches in assessing their relevance for treatment decisions and clinical trials.

Figure 2. Individual performances. The plots reveal the seizure number, TPs and FPs as well as the F1-score for each participant, sorted by increasing F1-score per participant (from left to right). The top panel (a) shows all seizure diaries in total, while the bottom panel is separated per diary form (b app, c paper). The gray-striped area highlights the participants who were excluded from further analysis (filtered out from dataset to subset). The green area highlights the perfect documenters (F1-score = 100%) and the red area highlights the non-perfect documenters (F1-score < 100%).

Table 1. Matching matrix showing the TPs, FPs, and FNs for the overall group, both diary forms, and both filter levels.

Table 2. Performance metrics calculated for seizure level and participant level on both filter levels.